Abstract

To verify compliance with a Comprehensive Test Ban Treaty (CTBT), low energy seismic activity must
be detected and discriminated. Monitoring small-scale seismic activity will require regional monitoring
capabilities (within about 2000 km; U.S. Congress, 1988). The reliable discrimination of small-scale seismic
events requires a multi-dimensional representation of the seismic signal. A multi-dimensional characterization
might include wave arrival times, magnitudes, and incidence and azimuth angles. These measurements can
be used singly or combined to form discriminants, which are then subjected to a set of discrimination
rules to categorize the source event. Statistical discrimination methods of this type require a training
sample, i.e., a set of real or simulated seismic data used to optimize or tune the discrimination algorithm by
assigning weights to the various discriminants, or by modifying the structure of the algorithm. Identifying
the signatures of various seismic sources also requires a geologic characterization of the shallow structure
of the earth in each particular region of interest. The results can be used to construct seismic signals
representing nuclear test sources. These synthetic or simulated signals can be combined with empirical
signals of earthquakes and mining activities to form a training sample for each region. This paper identifies
several statistical issues that must be resolved in order to address the CTBT verification mission. These are
all associated with uncertainties in the multidimensional characterization measurements or in the correlations
among them. In particular, further research is needed on the statistical properties of:

• wave arrival time estimates, especially for regional wave arrivals, which sometimes tend to be emergent;

• regional velocity tables, i.e., the travel time tables that characterize the regional geology;

• measurements from regional seismic arrays, which can theoretically be combined to provide better
estimates of wave arrival times, magnitudes, and direction;

• evasion scenarios (see U.S. Congress, 1988, Chapter 6);

• association, i.e., the agreement among seismic stations or between global and regional seismic networks
that a seismic event has occurred and how to classify it;

• training samples, especially how to eliminate bias in the sample;

• robust discrimination algorithms, e.g., algorithms that are less sensitive to data with a poor
signal-to-noise ratio;

• Bayesian discrimination algorithms, e.g., algorithms that utilize expert opinion to substitute for missing
data;

• statistical interdependencies in a regional seismic analysis, e.g., the relationship between the
uncertainties in detection, phase identification, event association and discrimination.

Also discussed are several common statistical discrimination methods including linear discrimination,
classification and regression trees (CART) and logistic regression.

Objective

To  meet  the  exacting  demands  of  monitoring  a  CTBT,  tools  that  integrate  both  seismic  and  statistical 
technologies  are  needed.  Discriminating  small  seismic  events  in  a  region  of  interest  requires  a  geological 
characterization  of  the  crust  of  the  earth  in  that  region.  This  characterization  leads  to  an  understanding  of 
the  unique  seismic  signatures  that  will  be  produced  by  different  seismic  sources.  Characterization  of  regions 
of  interest  to  the  United  States  (U.S.)  is  currently  being  studied  as  a  part  of  the  U.S.  Department  of  Energy 
(DOE)  CTBT  R  &  D  program  (DOE,  1994).  The  findings  from  this  research  could  conceivably  be  used 
to  construct  synthetic  or  simulated  seismic  signals  representing  the  behavior  of  a  nuclear  source.  These 
simulated signals can be combined with empirical signals from earthquakes and mining activities to form a
training sample for the region. Discrimination for low energy seismic events will require a multi-dimensional
representation of a seismic signal. Multi-dimensional measurements from a seismic signal (discriminants) can
be combined by a variety of statistical techniques to form a unified discrimination method. With a training
sample, a statistical discrimination technique is trained to optimally combine discriminants. For example, a
discrimination rule might make use of a sum of weighted discriminants with the weights estimated from a
regional training sample. Several common statistical discrimination methods are:

• linear discrimination, including Fisher’s linear discriminant function

•  quadratic  discrimination 

•  nonparametric  discrimination,  including  nonparametric  likelihood  methods  and  kth  nearest-neighbor 
methods 

•  classification  and  regression  trees  (CART) 

•  logistic  regression. 

The  objective  of  this  research  is  to  identify  and  resolve  the  statistical  issues  associated  with  monitoring 
a  CTBT  and  to  identify  and  research  various  statistical  discrimination  methods  appropriate  for  regional 
seismic  discrimination. 


Preliminary  Research  Results 

Statistical  Issues  in  Seismic  Analysis 

Several general statistical issues need to be resolved in order to effectively verify a CTBT. Addressing them
will undoubtedly uncover more detailed statistical questions. These issues include:

• Statistical properties of wave arrival time estimates: Arrival times of various seismic waves can be
estimated in several ways, with associated uncertainties. Hypocenter estimation techniques are based
on arrival times of different types of waves from a seismic disturbance. Estimates of depth and epicenter
are used as discrimination tools. To ascribe uncertainty to a hypocenter estimate, the uncertainty in a
wave arrival time estimate must be resolved. The seismic community is fully aware of the problem of
ascribing uncertainty to a wave arrival time estimate in a teleseismic setting. This issue will need to
be revisited in a regional setting because regional wave arrivals often exhibit a gradual transition from
noise to wave signal, i.e., they are emergent.

• Statistical properties of regional travel time tables: To develop a U.S. CTBT seismic monitoring
system, regional travel time tables will need to be developed. At regional distances, small variations in
the velocity structure of the earth can have a significant impact on location and source characterization.
To construct an uncertainty statement for a regional hypocenter estimate, the uncertainty in a regional
travel time table will need to be combined with wave arrival time uncertainties. Travel time tables
provide the single largest source of systematic error in a hypocenter estimate.

• Statistical properties of seismic measurements from arrays: The proposed CTBT monitoring system
will use seismic arrays to monitor regions of interest. The statistical properties of teleseismic array
data are well understood in the seismic community. The statistical issues associated with using array
data in regional seismic analysis will need to be researched. Regional seismic array data will be
combined to estimate wave arrival times. The uncertainty in this process will need to be evaluated.
Also, calculating wave magnitudes from regional array data will form the foundation for some regional
discriminants. The statistical properties of regional array magnitudes and the discriminative power of
these magnitudes should be researched.

• Statistical issues of evasion scenarios: The U.S. Congress Office of Technology Assessment
(U.S. Congress, 1988, Chapter 6) discusses several viable evasion scenarios that need to be addressed to
effectively implement a CTBT. One of these scenarios involves masking or decoupling the energy release
from a nuclear weapon test by performing the test in an open underground cavity. The parameters
(decoupling factors) associated with an evasive decoupled weapon test need to be estimated. As noted in
the congressional report, decoupling factors are needed to establish CTBT monitoring thresholds. The
statistical properties of decoupling factors for various geologies and test cavity configurations should
therefore be researched.

• Statistical properties of association: Errors in the association process often lead to the cataloging of
spurious events, or the degradation of the accuracy of seismic event locations. A quantitative measure of
the strength of association is necessary. The creation of such a measure is a statistical challenge because
it would have to combine measures of the similarity of the associated waveforms, the agreement of the
back azimuths and slowness, and the agreement of the phase arrival times, all weighted by some type
of observed signal-to-noise ratio. The measure must include penalties for missing data; for example, an
analyst may wonder why a high quality, low noise station, which usually detects events from the area
in question, did not detect it. If the station is operational, such non-data is strong evidence against the
event being a true event. The problem of misassociation of individual events aside, another statistical
issue is the estimation of the overall rate of misassociations reported in seismic bulletins and catalogs
of events. Perhaps region-by-region rates of misassociation can be estimated by comparison of global
seismic catalogs with the seismic catalogs from regional networks. Such regional networks may provide
more complete catalogs of the region’s events, i.e., the region’s near-ground-truth.

• Statistical issues in constructing a training sample: Any regional discrimination process will need a
training sample to build individual discriminant weights and possibly the structure of the discrimination
algorithm. A proper regional training sample will be similar in every respect to data that would
be analyzed in an operational setting. A training sample must not be biased by the elimination of
information. Using only “clean” events in a training sample will seriously misrepresent misclassification
rates and uncertainties in a discrimination algorithm. A training sample can further misrepresent these
misclassification rates and uncertainties if the size of an event is confounded with the source of the
event, e.g., if all large magnitude events in a training sample are earthquakes (or conversely if all are
explosions). Finally, a training sample may include designed nuclear weapon tests or mining explosions
(calibration events) in regions of interest. Statistical experimental design techniques can contribute to
an optimally designed calibration event.

• Robustness of statistical discrimination algorithms: The ability of a statistical discrimination method
to accurately perform under a less than optimal operational setting (its robustness) should be
researched. The statistical discrimination methods discussed below should be considered when
developing a U.S. CTBT seismic monitoring system because the robustness properties of these methods can
be readily studied. These methods can be synergistically used as evidence when identifying the source
of a seismic event. Complementary to research on the robustness of statistical discrimination methods,
there should be research on statistical methods to address missing data. It is conceivable that it may
not be possible to construct an appropriate training sample for a future region of interest. Expert
opinion may be required to construct an initial discrimination algorithm for such a region. In the
statistical community, methods of integrating expert opinion into a statistical methodology are known
as Bayesian methods. Determining if a statistical discrimination technique is amenable to Bayesian
methods should be an important CTBT research task.

• Statistical interdependencies in a seismic analysis: All aspects of a regional seismic analysis are related
or interdependent. These interdependencies will most likely produce statistical correlations that must
be addressed. For example, a hypocenter estimate is used in forming some seismic discriminants.
Understanding the correlations between various seismic measurements and sub-analyses is necessary
to develop general uncertainty statements.

Statistical  Techniques  for  Seismic  Discrimination 

In a seismological setting, statistical discrimination is the process of classifying a candidate seismic event as
an earthquake, a chemical explosion, or a nuclear detonation using information from seismic discriminants
(variables containing information derived from a seismic waveform). For a lucid discussion of potential
regional seismic discriminants see Blandford (1995). The goal of discriminant analysis is not only to identify
important or relevant discriminants but also to design a procedure incorporating these discriminants that
accurately classifies the source of a seismic disturbance. In this section, some basic statistical multivariate
discrimination methods are reviewed. Examples that illustrate each of these statistical discrimination
methods are included. The data used in these examples were collected by Walter, Mayeda, and Patton (1994).
It  is  important  to  remember  that  these  examples  are  not  intended  to  be  an  authoritative  seismic  analysis  of 
these  data.  Rather,  the  goal  is  to  use  data  with  seismic  characteristics  to  illustrate  the  features  of  statistical 
discrimination  methods.  When  presented  with  these  examples,  the  reader  should  focus  on  the  potential  utility 
of  the  statistical  discrimination  methods  and  not  the  specific  inferences  from  this  small  data  set. 

For a seismic event, a vector of p discriminants, $x = (x_1, \ldots, x_p)'$, is measured or derived from a seismic
waveform. The vector x might include wave arrival times, magnitudes, incidence and azimuth angles and
other potential discriminants. Note that these discriminants, $x_i$, $i = 1, \ldots, p$, can take on any value, real
(e.g., focal depth) or categorical (e.g., polarity of first motion). A classifier or discrimination rule is defined
as a function d(x) that mathematically combines the discriminants in x. The value of the function d(x)
indicates the most likely source of a seismic event. An alternative formulation of the discrimination rule is
to consider the vector of discriminants, x, as a point in a p-dimensional space. The discrimination rule d(x)
can then be thought of as partitioning or dividing this p-dimensional space into sections. Each section would
then be associated with a particular seismic source. For example, if a single real discriminant is considered,
then d(x) represents a “cut” on that discriminant, dividing the real line into “right” and “left” sections. All
candidate events with values of this discriminant to the left of the cut could be labeled as explosions, while
all values to the right could be labeled as earthquakes. If two discriminants are considered, then d(x) is a line
dividing the plane of real numbers into “left” and “right” areas. If three discriminants are considered, then
d(x) is a plane slicing through three-dimensional space. Note that more complex rules are not restricted to
a single partition, nor are the partition boundaries always straight lines.

The error involved in a classification scheme is governed by the rule that partitions the relevant
multivariable space. Some insight into the sources and behavior of the misclassification error can be found by
studying how a classification rule partitions the space of possible discriminant values. Let the discriminants
x for a particular seismic source be generated from a probability model (distribution) that is distinct from
the probability model describing another seismic source. A classification rule divides the variable space
into sections, with each section representing a seismic source. The probability of misclassification is simply
the probability of making an incorrect classification. The total misclassification probability is a sum of the
individual source misclassification probabilities.
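The single-discriminant "cut" described above gives a simple worked case of these probabilities. The sketch below assumes that the discriminant is Gaussian for each source; the function name, the means, and the standard deviations are hypothetical illustration values and are not taken from the data used in this paper.

```python
from statistics import NormalDist


def misclassification_probabilities(cut, explosion, earthquake):
    """Error probabilities for a one-dimensional cut rule.

    Events with a discriminant value left of `cut` are labeled explosions;
    events right of `cut` are labeled earthquakes.
    """
    # An explosion is mislabeled when its discriminant falls right of the cut.
    p_quake_label_given_explosion = 1.0 - explosion.cdf(cut)
    # An earthquake is mislabeled when its discriminant falls left of the cut.
    p_explosion_label_given_quake = earthquake.cdf(cut)
    total = p_quake_label_given_explosion + p_explosion_label_given_quake
    return p_quake_label_given_explosion, p_explosion_label_given_quake, total


# Hypothetical source models (illustrative values only).
explosion_model = NormalDist(mu=-1.0, sigma=1.0)
earthquake_model = NormalDist(mu=1.0, sigma=1.0)

errors_at_zero = misclassification_probabilities(0.0, explosion_model, earthquake_model)
errors_at_half = misclassification_probabilities(0.5, explosion_model, earthquake_model)
print(errors_at_zero)
print(errors_at_half)
```

Moving the cut to the right lowers the chance of mislabeling an explosion and raises the chance of mislabeling an earthquake, which is the trade-off that any partition of the discriminant space has to balance.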

Discrimination rules are constructed based on past experience in the form of a training sample. A training
sample is a set of discriminant vectors x with known classification that is representative of the distribution
of the seismic sources. This set of data is used to “train” the discrimination rule. A training sample is used
to build a discrimination rule and to test its performance or accuracy with cross-validation methods. The
performance of a discrimination rule is generally ascertained through some measure of misclassification cost.
For example, Taylor et al. (1989) and Glaser et al. (1986) discuss the cost function

$$C(x \mid q)\,\pi_q\,P(x \mid q) + C(q \mid x)\,\pi_x\,P(q \mid x), \qquad (1)$$

where $\pi_q$ and $\pi_x$ are the prior probabilities of an earthquake and an explosion, $C(x \mid q)$ and
$C(q \mid x)$ are the costs or penalties associated with mislabeling an earthquake as an explosion and an
explosion as an earthquake, respectively, and $P(x \mid q)$ and $P(q \mid x)$ are the corresponding
misclassification probabilities.

Such a criterion allows considerable flexibility to account for a variety of situations. For example, in a
CTBT setting, the relative frequency of nuclear explosions versus earthquakes should be quite small. Hence,
$\pi_x$ should be set quite small relative to $\pi_q$ in a CTBT setting. Perhaps more important is the cost associated
with the different types of error. A false alarm (labeling an event as an explosion when in fact it is an
earthquake) may be thought of as less serious than the failure to detect a violation of the CTBT (labeling
an event as an earthquake when in fact it is an explosion). Hence, the cost associated with failure to detect
a violation, $C(q \mid x)$, would be set higher than the cost of a false alarm, $C(x \mid q)$. The probabilities in Equation
(1) are generally unknown quantities, but they can easily be estimated with cross-validation methods.
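To make the role of the priors and costs concrete, the sketch below evaluates the cost function of Equation (1) for one hypothetical rule. The function name, the argument names, and every numerical value are assumptions chosen for illustration; in practice the two error probabilities would come from cross-validation on a regional training sample.

```python
def expected_cost(cost_false_alarm, cost_missed_violation,
                  prior_earthquake, prior_explosion,
                  p_false_alarm, p_missed_violation):
    """Equation (1): C(x|q) * pi_q * P(x|q) + C(q|x) * pi_x * P(q|x)."""
    false_alarm_term = cost_false_alarm * prior_earthquake * p_false_alarm
    missed_term = cost_missed_violation * prior_explosion * p_missed_violation
    return false_alarm_term + missed_term


# Hypothetical inputs: explosions are rare, but missing one is costly.
cost = expected_cost(
    cost_false_alarm=1.0,         # C(x|q), earthquake labeled as explosion
    cost_missed_violation=100.0,  # C(q|x), explosion labeled as earthquake
    prior_earthquake=0.99,        # pi_q
    prior_explosion=0.01,         # pi_x
    p_false_alarm=0.05,           # P(x|q)
    p_missed_violation=0.10,      # P(q|x)
)
print(cost)
```

With these illustrative numbers the missed-violation term dominates the total even though explosions are assumed to be a hundred times rarer than earthquakes, because its cost is set a hundred times higher and its error probability is larger.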

Linear  Discrimination 

One of the most conceptually simple rules, linear discrimination, is based on the assumption that the sources
exhibit Gaussian distributions with identical covariance structure (i.e., differing only in location). A linear
discrimination rule assigns a candidate event to the source with centroid closest to the position of x in
the sample space. Many distance metrics are possible, but the most natural is the Mahalanobis distance,
using the pooled within-group sample variances (an unbiased estimator of the common covariance matrix
of the groups). For example, consider two sources, earthquake and nuclear detonation (NUDET). Then the
estimated covariance is written as $s = (n_x s_x + n_q s_q)/(n_x + n_q - 2)$, where $s_x$ and $s_q$ are the sample
covariances estimated from the training sample and $n_x$ and $n_q$ are the sample sizes of the training sample
for the two sources. A new observation x is labeled as a NUDET if
$(\bar{x}_x - \bar{x}_q)' s^{-1} (x - \tfrac{1}{2}(\bar{x}_x + \bar{x}_q)) > 0$, where $\bar{x}_x$ and $\bar{x}_q$
are the means of the training sample for the two sources. A quadratic discrimination rule is possible if the
covariances are not equal.
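The two-source linear rule above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation used for the figures: the helper names and the toy training vectors are hypothetical, and the code assumes that the group covariances use divisor n, so that n times the covariance equals the scatter matrix of deviations about the group mean.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean (n times covariance)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_linear_rule(nudets, quakes):
    """Return the weight vector s^{-1}(mean_x - mean_q) and the midpoint of the means."""
    mean_x, mean_q = mean_vector(nudets), mean_vector(quakes)
    sx, sq = scatter_matrix(nudets, mean_x), scatter_matrix(quakes, mean_q)
    dof = len(nudets) + len(quakes) - 2
    pooled = [[(a + b) / dof for a, b in zip(rx, rq)] for rx, rq in zip(sx, sq)]
    weights = solve(pooled, [a - b for a, b in zip(mean_x, mean_q)])
    midpoint = [(a + b) / 2.0 for a, b in zip(mean_x, mean_q)]
    return weights, midpoint


def is_nudet(x, weights, midpoint):
    """Label x as a NUDET when the linear score is positive."""
    return sum(w * (xi - mi) for w, xi, mi in zip(weights, x, midpoint)) > 0


# Hypothetical two-discriminant training sample (illustrative values only).
nudets = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
quakes = [[4.0, 5.0], [4.5, 4.8], [5.0, 5.5], [4.2, 5.3]]
weights, midpoint = fit_linear_rule(nudets, quakes)
print(is_nudet([1.4, 2.0], weights, midpoint), is_nudet([4.6, 5.1], weights, midpoint))
```

Because the pooled covariance matrix is symmetric, solving the linear system for the weight vector once and then taking an inner product with each new observation is equivalent to evaluating the quadratic-form expression in the rule directly.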

Nonparametric  Discrimination 

The classical approach to statistical discrimination involves assuming a parametric form for the probability
distribution of each group and using a training sample to estimate the relevant parameters. A candidate
event is then classified to the group with the largest likelihood. For example, one might choose the Gaussian
distribution to model the earthquakes. A training sample would then be used to estimate the mean and
covariance. The distribution for NUDETs would be handled similarly. A candidate event would then be
classified as an earthquake if $f_q(x; \bar{x}_q, s_q)$ is greater than $f_x(x; \bar{x}_x, s_x)$ or as a NUDET otherwise. Here, $f_q$
and $f_x$ are the earthquake and NUDET distributions with parameters $\bar{x}_q$, $\bar{x}_x$, $s_q$, and $s_x$ estimated from
training data.

Hand (1981) and Silverman (1986) have studied the use of nonparametric methods in classification
problems, replacing the parametric probability models in the classical procedure with nonparametric density
estimates. Examples of nonparametric density estimates include the histogram and the kernel estimator.
Recent advances in multivariate probability density estimation (see Scott (1992)) have led to further work in
nonparametric methods for discrimination. Hall and Wand (1988) study the use of nonparametric methods
with probability model differences as a discrimination tool. Holmstrom and Sain (1993) successfully apply
a ratio of nonparametric probability models to applications in particle physics.

An example of nonparametric discrimination is shown in Figure 1. The plot on the right shows decision
boundaries based on a bivariate product kernel estimator of the distributions of the earthquakes and the
NUDETs. New events are classified according to which model yields a higher density value at the new point,
x. Note that the boundaries are highly nonlinear, offering greater flexibility. This flexibility can become
increasingly important for situations with more complex structure or as the dimensionality increases.

Figure 1: Example of nearest neighbor (left plot) and kernel (right plot) discrimination.
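A product kernel classifier of the kind described for the right-hand plot can be sketched as follows. The Gaussian kernel, the fixed bandwidths, the function names, and the toy training points are all assumptions made for illustration; the kernel and bandwidths actually used for Figure 1 are not stated in the text.

```python
from math import exp, pi, sqrt


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)


def product_kernel_density(x, sample, bandwidths):
    """Average, over the sample, of the product of one-dimensional kernels."""
    total = 0.0
    for point in sample:
        term = 1.0
        for xj, pj, hj in zip(x, point, bandwidths):
            term *= gaussian_kernel((xj - pj) / hj) / hj
        total += term
    return total / len(sample)


def classify(x, quakes, nudets, bandwidths):
    """Assign x to the source whose estimated density is higher at x."""
    f_q = product_kernel_density(x, quakes, bandwidths)
    f_x = product_kernel_density(x, nudets, bandwidths)
    return "earthquake" if f_q > f_x else "NUDET"


# Hypothetical training points and bandwidths (illustrative values only).
quake_sample = [[4.0, 5.0], [4.5, 4.8], [5.0, 5.5]]
nudet_sample = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5]]
h = [0.5, 0.5]
print(classify([4.4, 5.1], quake_sample, nudet_sample, h))
print(classify([1.4, 2.1], quake_sample, nudet_sample, h))
```

The bandwidths control the amount of smoothing: small values give boundaries that follow individual training points, while large values give smoother boundaries.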

The plot on the left illustrates a discrimination method in which new events are classified according to
a kth nearest-neighbor rule (Fix and Hodges, 1951) with k = 1. Here a candidate event is classified to the
group in which the nearest point to that event belongs. The boundaries in this case are highly irregular
due to the lack of “smoothing”. This single nearest-neighbor method represents an extreme case, when no
smoothing is performed. The decision boundary for the kth nearest-neighbor rule will “smooth” as the value
of k increases.
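The kth nearest-neighbor rule is short enough to state directly in code. The sketch below assumes Euclidean distance and a simple majority vote among the k nearest training points; the function name and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(x, training, k=1):
    """Classify x by majority vote among its k nearest training points.

    `training` is a list of (vector, label) pairs. When the vote is tied,
    the label of the nearer point wins, because the sort is by distance
    and Counter keeps labels in first-seen order.
    """
    nearest = sorted(training, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical one-discriminant-pair training sample (illustrative values only).
training_sample = [
    ([0.0, 0.0], "NUDET"),
    ([1.0, 0.0], "earthquake"),
    ([1.2, 0.0], "earthquake"),
    ([-3.0, 0.0], "NUDET"),
]
candidate = [0.4, 0.0]
print(knn_classify(candidate, training_sample, k=1))
print(knn_classify(candidate, training_sample, k=3))
```

For this candidate the single nearest neighbor is a NUDET, but two of the three nearest neighbors are earthquakes, so the label changes with k. This is the smoothing effect described above seen at a single point.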

Tree-Based Methods

Binary tree methods represent an important improvement over some of the basic methods of statistical
discrimination and over the use of standard linear and additive models for classification problems. First and
foremost, binary tree methods can incorporate both numeric and categorical discriminants. Complicated
discriminant behavior can also be modeled easily. Furthermore, binary tree methods are conceptually simple
and yield a nice graphical representation of the final decision tree and the resulting classifications. This is
especially valuable when dealing with multi-dimensional data. For an overview of the theory and methodology,
see Breiman et al. (1984). Artificial Neural Networks (ANN) also have these features; however, binary tree
methods have advantages over ANN in seismic discrimination applications. For a comparison of binary trees
and neural networks see Blough and Anderson (1994).

Binary tree methods are based on the notion of recursive partitioning. To illustrate, consider again
the notion of a vector of discriminants, x, as a distinct point lying in a multivariate space. To build a
binary decision tree, the discrimination algorithm recursively divides this multivariate space into smaller
and smaller subregions. This dividing process is based on the training sample, and continues until each
subregion is homogeneous with respect to one of the sources.
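The recursive partitioning step can be sketched as below. This is a minimal illustration and not the CART procedure of Breiman et al. (1984) in full: it assumes numeric discriminants only, uses the Gini impurity as the splitting criterion (an assumption, since the text does not name one), stops when no split reduces the impurity, and performs no pruning. All function names and data values are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 when every label in the region is the same."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_rows(rows, labels, j, t):
    """Divide a region into the parts with x[j] <= t (left) and x[j] > t (right)."""
    left = [(r, l) for r, l in zip(rows, labels) if r[j] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[j] > t]
    return left, right


def best_split(rows, labels):
    """Return the (discriminant index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            parts = split_rows(rows, labels, j, t)
            score = sum(len(p) * gini([l for _, l in p]) for p in parts) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(rows, labels):
    """Recursively partition the training sample; leaves are source labels."""
    split = best_split(rows, labels)
    if split is None:
        return max(sorted(set(labels)), key=labels.count)
    j, t = split
    left, right = split_rows(rows, labels, j, t)
    return (j, t,
            grow([r for r, _ in left], [l for _, l in left]),
            grow([r for r, _ in right], [l for _, l in right]))


def predict(tree, x):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Hypothetical training sample (illustrative values only).
train_rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0], [2.5, 0.5]]
train_labels = ["earthquake", "earthquake", "NUDET", "NUDET", "NUDET"]
tree = grow(train_rows, train_labels)
print(tree)
```

Each internal node of the returned tuple corresponds to one axis-parallel cut of the discriminant space, so the leaves are rectangular subregions.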

This tree-growing process leads to a large number of regions and can overfit by becoming overly
representative of the training sample. To prevent overfitting and extreme complexity, a tree is grown and then pruned
or shrunk to a more manageable size. This pruning is generally done by removing the least important splits,
based on a cost-complexity measure that is designed to balance the homogeneity of the final regions and the
complexity of the tree. This pruning can be thought of as combining adjacent regions of the multivariate
space that are very much alike.

Figure 2: Binary tree example.

Graphically, the splitting of the discriminant space into regions can be displayed as a binary tree. An
example is shown in Figure 2, using the data from Walter, Mayeda, and Patton (1995). The figure on
the left shows a tree that has been grown and then pruned using a cost-complexity function as discussed
in the previous paragraph. The final partitioning of the discriminant space is shown in the figure on the
right. In this example, 59 earthquakes were correctly identified as earthquakes and 1 nuclear detonation
was incorrectly identified as an earthquake. The counts at the “ND” branches of the tree are interpreted
similarly.

Logistic  Regression 

Standard linear regression techniques have also been used extensively for discrimination tasks. The procedure
is to model the responses, in this case a dummy variable taking on values of 0 or 1 depending on the
candidate event being an earthquake or NUDET, as a linear function of the predictor variables, in this case
the discriminants. In the simplest form, the model is written as

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1, \qquad (2)$$

where p is the probability of NUDET as calculated using the training sample and $x_1$ is a discriminant
measured from a waveform. The parameters $\beta_0$ and $\beta_1$ are estimated using an iterative maximum likelihood
procedure. The form of the left hand side of Equation (2) is used to ensure that predicted values for p are bona
fide probabilities, i.e., that they lie in the interval [0, 1]. Once the parameters are estimated from a training
sample, a new candidate event is classified as a NUDET if the estimated probability is greater than 0.5.

Figure 3: Logistic regression example. Dotted line represents linear discrimination.


Note that the model given by Equation (2) can easily be extended to include multiple discriminants as well
as interaction terms. Furthermore, categorical variables can also be included as variables in the model. An
example is shown in Figure 3, again using the data collected by Walter, Mayeda, and Patton (1995). The solid
line represents the partition of the discriminant space based on the logistic regression approach. The dotted
line represents the partition based on linear discrimination, and is shown for comparison. The two methods
are quite similar since the fitted logistic model, in this case, included no interaction terms. Logistic regression
methods are easily extended to the CTBT problem of discriminating between earthquakes, NUDETs and
commercial explosions.
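The iterative maximum likelihood fit of the single-discriminant model in Equation (2) can be sketched with Newton-Raphson iterations. This is one common choice of iterative procedure and is an assumption here, since the text does not say which algorithm was used. The function names and the toy data are hypothetical, and the sketch assumes the two sources overlap in the discriminant, because the estimates do not exist for perfectly separated data.

```python
from math import exp


def sigmoid(z):
    """Inverse of the log-odds transform on the left side of Equation (2)."""
    return 1.0 / (1.0 + exp(-z))


def log_likelihood_gradient(b0, b1, x, y):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(residuals), sum(r * xi for r, xi in zip(residuals, x))


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of log(p / (1 - p)) = b0 + b1 * x.

    y is 1 for a NUDET and 0 for an earthquake.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = log_likelihood_gradient(b0, b1, x, y)
        # Information matrix entries (negative Hessian of the log-likelihood).
        w = [sigmoid(b0 + b1 * xi) * (1.0 - sigmoid(b0 + b1 * xi)) for xi in x]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def classify(x_new, b0, b1):
    """Label a candidate as a NUDET when the estimated probability exceeds 0.5."""
    return "NUDET" if sigmoid(b0 + b1 * x_new) > 0.5 else "earthquake"


# Hypothetical overlapping training sample (illustrative values only).
x_train = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_train = [0, 0, 1, 0, 1, 0, 1, 1]
beta0, beta1 = fit_logistic(x_train, y_train)
print(beta0, beta1, classify(4.0, beta0, beta1), classify(0.5, beta0, beta1))
```

Extending the sketch to several discriminants or to interaction terms replaces the 2 x 2 system with a larger linear system, but the iteration has the same form.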


Recommendations  and  Future  Plans 

Research to address the statistical issues discussed above is an integral part of the DOE CTBT R & D
program (DOE, 1994). In collaboration with AFTAC, PNL plans to research and resolve the statistical
issues associated with regional discrimination. This research comprises two major issues. First, the
operational capabilities of a CTBT monitoring system must be fully understood when monitoring regions
of interest. Realistic training samples can be used to accurately assess misclassification rates and uncertainties
in a discrimination algorithm. Developing appropriate regional training samples is critical to the regional
discrimination problem. Second, the statistical discrimination methods discussed above can be synergistically
used by AFTAC as ancillary or corroborative evidence when identifying the source of a seismic event. PNL will
continue research on statistical discrimination methods as a collaborative effort with AFTAC and Sandia
National Laboratory.